Parallel and comparable corpora: what are they up to?
Abstract
With ever-increasing international exchange and accelerated globalisation, translation and contrastive studies are more popular than ever. In this new wave of research on translation and contrastive studies, corpora, and multilingual corpora in particular, play a prominent role. In this chapter, we illustrate the value of parallel and comparable corpora for translation and contrastive studies.
Similar resources
Extracting a parallel corpus from comparable documents to improve translation quality in machine translation systems
Data used to train statistical machine translation systems are usually drawn from three kinds of resources: parallel, non-parallel and comparable text corpora. Parallel corpora are the ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...
Repetition and Language Models and Comparable Corpora
I will discuss a couple of non-standard features that I believe could be useful for working with comparable corpora. Dotplots have been used in biology to find interesting DNA sequences. Biology is interested in ordered matches, which show up as (possibly broken) diagonals in dotplots. Information Retrieval is more interested in unordered matches (e.g., cosine similarity), which show up as squa...
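To make the ordered-versus-unordered contrast concrete, here is a minimal Python sketch, assuming whitespace tokenisation and exact token matches (simplifications not taken from the abstract). A dotplot marks every cell (i, j) where token i of one text equals token j of another; an ordered match shows up as a run of consecutive diagonal cells, while a bag-of-words cosine ignores order entirely. Only unbroken diagonal runs are measured here, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def dotplot(tokens_a, tokens_b):
    """Cells (i, j) where token i of text A equals token j of text B."""
    return {(i, j)
            for i, a in enumerate(tokens_a)
            for j, b in enumerate(tokens_b)
            if a == b}


def longest_diagonal(cells):
    """Length of the longest unbroken diagonal run (an ordered match)."""
    best = 0
    for i, j in cells:
        if (i - 1, j - 1) in cells:
            continue  # not the start of a run; it is counted from its start
        length = 1
        while (i + length, j + length) in cells:
            length += 1
        best = max(best, length)
    return best


def bag_of_words_cosine(tokens_a, tokens_b):
    """Order-insensitive cosine similarity of word-count vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

With `a = "the cat sat on the mat".split()`, comparing `a` with `list(reversed(a))` gives a cosine of 1 (up to floating-point rounding), because the word counts are identical, while `longest_diagonal` finds no run longer than a single cell: the two texts share vocabulary but not order.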
Hybrid Parallel Sentence Mining from Comparable Corpora
Mining for parallel sentences in comparable corpora is much more difficult than aligning sentences in parallel corpora. Sentence alignment in parallel corpora usually exploits simple empirical evidence (turned into assumptions), such as (i) the length of a sentence is proportional to the length of its translation and (ii) the discourse flow is necessarily the same in both parts of the bi-text ...
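The two assumptions can be illustrated with a minimal length-based aligner, loosely in the spirit of Gale and Church's length-based method but with a much simpler cost. Dynamic programming over monotonic "beads" encodes assumption (ii), and each bead is scored by how far its lengths deviate from proportionality, which encodes assumption (i). The `ratio`, `merge_penalty` and `skip_penalty` values are hypothetical defaults, not tuned parameters.

```python
def align(src_lens, tgt_lens, ratio=1.0, merge_penalty=0.1, skip_penalty=1.0):
    """Monotonic sentence alignment by dynamic programming.

    src_lens, tgt_lens: sentence lengths (e.g. in characters), in document order.
    ratio: assumed target/source length ratio (hypothetical default).
    Returns a list of beads (source_indices, target_indices).
    """
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Allowed beads: 1-1, 1-0 (deletion), 0-1 (insertion), 2-1 and 1-2 merges.
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

    def bead_cost(i, j, di, dj):
        if di == 0 or dj == 0:
            return skip_penalty  # a sentence left unaligned
        s = sum(src_lens[i:i + di])
        t = sum(tgt_lens[j:j + dj])
        denom = t + ratio * s
        # Relative deviation from the proportional-length assumption.
        deviation = abs(t - ratio * s) / denom if denom else 0.0
        return deviation + (0.0 if (di, dj) == (1, 1) else merge_penalty)

    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(i, j, di, dj)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace the cheapest path back from the final cell.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For source lengths `[20, 30, 10, 40]` and target lengths `[21, 29, 52]`, the cheapest path pairs the first two sentences one-to-one and merges the last two source sentences into the final target sentence. Because beads are forced to stay in order, the monotonic-flow assumption built into this aligner does not hold for comparable corpora, which is one plausible reason mining parallel sentences there is harder.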
Looking for Transliterations in a Trilingual English, French and Japanese Specialised Comparable Corpus
Transliterations and cognates have been shown to be useful for bilingual extraction from parallel corpora. Observing transliterations in a trilingual English, French and Japanese specialised comparable corpus reveals evidence that they are likely to be usable with comparable corpora too, since they are an important and relevant part of the common vocabulary, but they also yield l...
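Cognate and transliteration candidates are often spotted by string similarity. The sketch below assumes one common measure, 1 minus the Levenshtein distance normalised by the longer word, computed after lower-casing and stripping diacritics; it is an illustration, not the method of the work cited above. It handles only Latin-script pairs such as English and French; Japanese katakana would first need romanisation, which is not shown. The 0.7 threshold and the helper names are hypothetical.

```python
import unicodedata


def strip_accents(word):
    """Lower-case and drop diacritics, e.g. 'protéine' -> 'proteine'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edit_distance(a, b):
    """Classic Levenshtein distance, keeping only one previous DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def cognate_score(w1, w2):
    """Similarity in [0, 1]: 1 - normalised edit distance on accent-stripped forms."""
    a, b = strip_accents(w1), strip_accents(w2)
    if not a and not b:
        return 1.0
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def candidate_pairs(english, french, threshold=0.7):
    """Hypothetical helper: keep cross-language word pairs scoring >= threshold."""
    return [(e, f) for e in english for f in french
            if cognate_score(e, f) >= threshold]
```

For example, `candidate_pairs(["protein", "gene", "liver"], ["foie", "gène", "protéine"])` keeps `("protein", "protéine")` and `("gene", "gène")` and rejects the non-cognate translation pair "liver"/"foie", which is exactly the kind of pair string similarity cannot find.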
Parallel Sentence Extraction from Comparable Corpora with Neural Network Features
Parallel corpora are crucial for machine translation (MT); however, they are quite scarce for most language pairs and domains. Because comparable corpora are far more readily available, many studies have extracted parallel sentences from them for MT. In this paper, we exploit neural network features acquired from neural MT for parallel sentence extraction. We observe significant improveme...
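The extraction step can be pictured as scoring every candidate source-target sentence pair and keeping confident, non-conflicting matches. In the minimal sketch below, the sentence vectors are hypothetical toy stand-ins; the neural features described in the abstract are not reproduced, and cosine similarity, the 0.8 threshold and greedy one-to-one matching are all illustrative assumptions.

```python
from math import sqrt


def vector_cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def extract_parallel(src_vecs, tgt_vecs, threshold=0.8):
    """Greedy one-to-one selection of sentence pairs scoring at least `threshold`.

    src_vecs, tgt_vecs: dicts mapping a sentence id to a vector (hypothetical
    stand-ins for a real sentence representation).
    Returns (source_id, target_id, score) triples, best first.
    """
    scored = sorted(
        ((vector_cosine(u, v), s, t)
         for s, u in src_vecs.items()
         for t, v in tgt_vecs.items()),
        reverse=True,
    )
    used_src, used_tgt, pairs = set(), set(), []
    for score, s, t in scored:
        if score < threshold:
            break  # the list is sorted, so every remaining pair scores lower
        if s in used_src or t in used_tgt:
            continue  # each sentence may appear in at most one pair
        used_src.add(s)
        used_tgt.add(t)
        pairs.append((s, t, score))
    return pairs


src = {"s1": [1.0, 0.0, 0.0], "s2": [0.0, 1.0, 0.0], "s3": [0.0, 0.0, 1.0]}
tgt = {"t1": [0.0, 0.9, 0.1], "t2": [0.95, 0.05, 0.0], "t3": [0.5, 0.5, 0.0]}
for s, t, score in extract_parallel(src, tgt):
    print(s, t, round(score, 3))
```

Greedy matching is simply a cheap way to ensure each sentence is used at most once; a real system might instead combine several features, such as the neural features described above, in a trained classifier.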